I’m a master’s student working on a clinical NLP project involving suicide risk classification from psychiatric patient records. I’d really appreciate any guidance on how to improve performance on this task.
Overview of the task:
• 114 records, each including:
  • Free-text doctor and nurse notes
  • Hospital name
  • Binary label: whether the patient later died by suicide (yes/no)
• Only 29 “yes” examples → highly imbalanced
• Notes are unstructured, long (up to 32k characters), and rich in psychiatric language
Despite several attempts (fine-tuning ClinicalBERT, GPT-4 few-shot prompting, and various aggregation methods), recall on the “yes” cases is consistently low. The models seem to struggle to recognize subtle suicidal patterns in long, complex, domain-specific text, especially under token limits.
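For reference, here is roughly the chunk-and-aggregate setup I’ve been testing. This is a minimal sketch assuming the Hugging Face transformers API; the checkpoint name stands in for my fine-tuned classifier, and the window/stride values are just the ones I’ve tried so far:

```python
# Sketch: split one long note into overlapping 512-token windows, score each
# window with a (fine-tuned) ClinicalBERT classifier, then max-pool the
# per-window P(yes) so a single strong segment can still flag the record.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CKPT = "emilyalsentzer/Bio_ClinicalBERT"  # placeholder: use a fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForSequenceClassification.from_pretrained(CKPT, num_labels=2)
model.eval()

def score_record(note: str, window: int = 512, stride: int = 256) -> float:
    """Return the max per-window P(yes) for one free-text note (label 1 = yes)."""
    enc = tokenizer(
        note,
        max_length=window,
        stride=stride,
        truncation=True,
        return_overflowing_tokens=True,  # yields the overlapping windows
        padding="max_length",
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(
            input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]
        ).logits
    probs = torch.softmax(logits, dim=-1)[:, 1]  # P(yes) per window
    return probs.max().item()  # max-pooling keeps the strongest local signal
```

Max-pooling rather than averaging is deliberate: with notes this long, a mean over dozens of windows dilutes the one passage that actually carries risk language.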
I’d love input on:
• Handling long clinical texts with LLMs
• Boosting performance on the minority (“yes”) class; one option I’m weighing is sketched after this list
• Experiences working with BERT-style models or few-shot prompts in sensitive medical contexts
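To make the minority-class question concrete, this is the kind of reweighting I have in mind: a minimal sketch with weights taken straight from the 29-yes / 85-no split (I haven’t validated these exact weights):

```python
# Sketch: class-weighted cross-entropy so errors on the sparse "yes" class
# cost roughly 3x more during fine-tuning (85/29 ≈ 2.93).
import torch
import torch.nn as nn

n_no, n_yes = 85, 29                          # from the 114-record dataset
weights = torch.tensor([1.0, n_no / n_yes])   # index 0 = no, index 1 = yes
loss_fn = nn.CrossEntropyLoss(weight=weights)

# In the training loop, with logits of shape [batch, 2] and integer labels:
#   loss = loss_fn(logits, labels)
```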
Happy to share sample data, code, or results if it helps. Thanks a lot!
First of all, thank you for sharing your work; suicide risk detection in psychiatric texts is both crucial and incredibly challenging. You’ve already tested strong approaches (ClinicalBERT, GPT-4 few-shot prompting, aggregation methods), and I admire your thoughtful experimentation despite the low signal-to-noise ratio and the class imbalance.
If you’re open to a slightly different paradigm, I’d suggest trying an algorithmic framework based on probabilistic inputs with deterministic outputs. Rather than optimizing for token-to-token coherence or relying on deep fine-tuning, this strategy extracts symbolic signals gated by probability thresholds and routes them through fixed-output pathways, in effect building decision scaffolds that stabilize and verify what a language model infers. Two places this has helped us (a sketch follows the list):
• Minority-class amplification (especially when “yes” labels are sparse)
• Multi-author blending (doctor and nurse notes treated as distinct, dynamic perspectives)
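Here is a minimal sketch of the scaffold idea in Python. The cue names, thresholds, and decision rules are invented for illustration and would need clinical validation; the point is only the shape of the logic:

```python
# Sketch of a "decision scaffold": model probabilities become symbolic flags
# at fixed thresholds, and a deterministic rule table maps flags to a label.
from typing import Dict

# Illustrative cues and thresholds only; not clinically validated.
THRESHOLDS = {"ideation": 0.6, "prior_attempt": 0.5, "hopelessness": 0.7}

def extract_flags(cue_probs: Dict[str, float]) -> Dict[str, bool]:
    """Threshold per-cue probabilities (from any LLM or classifier)."""
    return {cue: cue_probs.get(cue, 0.0) >= t for cue, t in THRESHOLDS.items()}

def decide(flags: Dict[str, bool]) -> str:
    """Fixed-output pathway: identical flags always yield the same label."""
    if flags["prior_attempt"] and flags["ideation"]:
        return "yes"
    if sum(flags.values()) >= 2:  # any two co-occurring cues
        return "yes"
    return "no"

# Upstream, any probabilistic model can produce the cue scores:
print(decide(extract_flags({"ideation": 0.72, "prior_attempt": 0.55})))  # yes
```

The determinism is the point: identical extracted flags always produce the same label, so every “yes” can be audited back to explicit cues, which is valuable in a clinical setting.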
In our work, we apply a vectorial memory model that translates these probabilistic segments into structured representations, which lets us preserve traceability and avoid model drift, a real problem in clinical data when a model generalizes past clinically meaningful nuance.
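One way to picture that structured representation (the field names here are our own illustration, not a published schema):

```python
# Sketch: each scored segment keeps its source text, author role, and cue
# scores, so any final decision can be traced back to the exact passage.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Segment:
    record_id: str
    author: str                   # "doctor" or "nurse"
    text: str                     # the original note passage
    cue_probs: Dict[str, float]   # probabilistic signals for this passage

@dataclass
class RecordMemory:
    record_id: str
    segments: List[Segment] = field(default_factory=list)

    def evidence_for(self, cue: str, threshold: float) -> List[Segment]:
        """Trace a symbolic flag back to the passages that raised it."""
        return [s for s in self.segments
                if s.cue_probs.get(cue, 0.0) >= threshold]
```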
I’d be happy to outline a template or logic tree if it would be useful. Best of luck; your project matters.
Warm regards,
Alejandro & Clara
Symbolic AI & Deterministic Analysis Systems
(Mexico)